Rank | Count | Beginning |
---|---|---|
37156 | 3815 | Hän |
213547 | 3249 | Se |
169042 | 3197 | Olet |
67787 | 3071 | Jos |
54687 | 2616 | Ilmoita |
163593 | 2496 | Nyt |
17896 | 2353 | Ei |
102038 | 2305 | Kun |
107201 | 2294 | Kuva: |
60611 | 1984 | Ja |
149924 | 1967 | Myös |
242191 | 1852 | Tämä |
146505 | 1760 | Mutta |
215934 | 1714 | Sen |
174518 | 1604 | On |
122469 | 1482 | Lisäksi |
254276 | 1243 | Tilaa |
152828 | 1226 | Näin |
232978 | 1023 | Suomen |
37817 | 1001 | Hänen |
286423 | 961 | Viime |
274860 | 926 | Vaikka |
29226 | 883 | Esimerkiksi |
23768 | 876 | En |
168084 | 796 | Olen |
76905 | 764 | Kaikki |
110343 | 734 | Kyllä |
194456 | 729 | Poliisi |
140768 | 718 | Mitä |
156361 | 692 | Ne |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
SELECT substring_index(sentence, ' ', 1) AS beg, count(*) AS cnt
FROM sentences
GROUP BY substring_index(sentence, ' ', 1)
ORDER BY cnt DESC
LIMIT 50;
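The same count can be sketched outside the database for any N. The following is a minimal illustration, not the pipeline used to produce the table: the function name `sentence_beginnings` and the toy sentences are assumptions introduced here. Splitting on a single space and keeping the first N pieces mirrors what `substring_index(sentence, ' ', N)` returns, including the case where a sentence has fewer than N words.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent beginnings made of the first n words.

    Words are separated by a single space, as in the SQL query; a sentence
    with fewer than n words is counted as a whole.
    """
    counts = Counter(" ".join(s.split(" ")[:n]) for s in sentences)
    return counts.most_common(limit)


# Hypothetical toy sample, not taken from the corpus.
sample = [
    "Hän tuli kotiin.",
    "Hän on täällä.",
    "Se on hyvä.",
    "Jos sataa, jäämme kotiin.",
]

top = sentence_beginnings(sample, n=1)
for beginning, count in top:
    print(count, beginning)
```

Calling the function with `n=2`, `n=3`, or `n=4` gives the counts corresponding to the longer sentence beginnings covered in the following subsections.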